System image compression with zstd #59227
src/staticdata.c

```c
jl_dlsym(handle, "jl_image_pointers", (void**)&image->pointers, 1);

image->size = ZSTD_getFrameContentSize(data, *plen);
image->data = (char *)malloc(image->size);
```
We probably want to mmap this with huge pages/large pages.
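One way to act on the huge-pages suggestion is sketched below. It assumes Linux with transparent huge pages and a 2 MiB huge-page size, and `alloc_image_buffer` is a hypothetical helper that would replace the `malloc` of the decompression target. `madvise(MADV_HUGEPAGE)` is only a hint, and other platforms (e.g. Windows large pages) would need their own path.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages, the common x86-64 default. */
#define HUGE_PAGE_SIZE ((size_t)2 << 20)

/* Hypothetical helper: map `size` (nonzero) bytes of anonymous, writable
 * memory for the decompressed heap image and ask the kernel to back it with
 * transparent huge pages. The length is rounded up to a huge-page multiple
 * and stored in *mapped_len. Returns NULL on failure.
 * Note: mmap only guarantees normal page alignment, so only the 2 MiB-aligned
 * interior of the mapping can actually be backed by huge pages. */
static void *alloc_image_buffer(size_t size, size_t *mapped_len)
{
    assert(mapped_len != NULL);
    size_t len = (size + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
#ifdef MADV_HUGEPAGE
    /* Advisory only: if THP is disabled the call fails and we just ignore it. */
    (void)madvise(p, len, MADV_HUGEPAGE);
#endif
    *mapped_len = len;
    return p;
}
```

The caller would decompress into the returned buffer and later release it with `munmap(p, len)` rather than `free`.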
The savings are really nice here, but is there a way to claw back some of the startup time? Thinking out loud: is it worth checking whether zstd has AVX-512 (or other fancy instruction) support that could speed this up but may not be enabled?
Currently I'm testing lz4 as an alternative compression algorithm that sacrifices some compression ratio for decompression speed, since lz4 decompression is designed to run at roughly RAM speed on modern CPUs. @vtjnash also had some ideas about doing decompression and relocation in a single pass that I'd like to try: in this version we touch a whole bunch of pages while decompressing, and then force them all back into cache later when performing relocations.
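The single-pass idea can be illustrated with a small sketch. It rests on several assumptions that are not Julia's actual format: the image is split into chunks whose decoded lengths are multiples of 8, relocations are 8-byte slots at sorted byte offsets to which a base address is added, and a plain `memcpy` stands in for the real zstd/lz4 decoder. `chunk_t`, `decode_chunk`, and `unpack_and_relocate` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chunk of a compressed image. A real version would hold a
 * compressed block; here the bytes are stored uncompressed. */
typedef struct {
    const unsigned char *src;
    size_t len;                 /* decoded length, a multiple of 8 */
} chunk_t;

/* Stand-in decoder: a real implementation would call a block decompressor. */
static void decode_chunk(unsigned char *dst, const chunk_t *c)
{
    memcpy(dst, c->src, c->len);
}

/* Decode each chunk and immediately apply the relocations that fall inside
 * it, while those bytes are still hot in cache. `relocs` holds byte offsets
 * of 8-byte slots, sorted ascending; each slot gets `base` added to it. */
static void unpack_and_relocate(unsigned char *dst,
                                const chunk_t *chunks, size_t nchunks,
                                const size_t *relocs, size_t nrelocs,
                                uint64_t base)
{
    size_t off = 0, r = 0;
    for (size_t i = 0; i < nchunks; i++) {
        decode_chunk(dst + off, &chunks[i]);
        size_t end = off + chunks[i].len;
        for (; r < nrelocs && relocs[r] < end; r++) {
            /* With 8-aligned chunks and slots, a slot never straddles chunks. */
            assert(relocs[r] % 8 == 0 && relocs[r] + 8 <= end);
            uint64_t v;
            memcpy(&v, dst + relocs[r], sizeof v);  /* no unaligned deref */
            v += base;
            memcpy(dst + relocs[r], &v, sizeof v);
        }
        off = end;
    }
}
```

Relocating chunk by chunk keeps the writes to each page close together in time instead of revisiting every page in a second pass; the cost is that the relocation table must be sorted by offset.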
I tried a simple test with the command-line zstd and lz4 tools (so it may not be representative), and they took basically the same amount of time, but zstd's compression was much better. So much better that I suspect the time was made up by reading less data. Relocating while decompressing sounds awesome if we can pull that off.
I believe we should use lz4hc, which is quite slow to compress but achieves compression ratios similar to zstd's (while decompressing about 2.5x faster).
Can we use threads for compressing/decompressing?
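Threaded decompression could look roughly like the sketch below. It assumes the heap image is split at compression time into independent frames whose decoded sizes are recorded, so each thread writes a disjoint slice of the output; as far as I know, libzstd decodes a single frame on one thread, which is why independent frames would be needed. A toy run-length codec of `(count, byte)` pairs stands in for zstd/lz4, and `frame_t`, `parallel_unpack`, and the one-thread-per-frame policy are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical independently decodable frame. */
typedef struct {
    const unsigned char *src;   /* toy RLE stream: (count, byte) pairs */
    size_t srclen;
    size_t decoded_len;         /* recorded at compression time */
} frame_t;

typedef struct {
    unsigned char *dst;
    const frame_t *frame;
} job_t;

/* Stand-in decoder for one frame. */
static void rle_decode(unsigned char *dst, const frame_t *f)
{
    for (size_t i = 0; i + 1 < f->srclen; i += 2) {
        memset(dst, f->src[i + 1], f->src[i]);
        dst += f->src[i];
    }
}

static void *worker(void *arg)
{
    job_t *job = arg;
    rle_decode(job->dst, job->frame);
    return NULL;
}

#define MAX_FRAMES 64

/* Decode up to MAX_FRAMES frames into consecutive ranges of dst, one thread
 * per frame. Returns the total number of bytes written. */
static size_t parallel_unpack(unsigned char *dst, const frame_t *frames, size_t n)
{
    pthread_t tids[MAX_FRAMES];
    job_t jobs[MAX_FRAMES];
    int started[MAX_FRAMES];
    size_t off = 0;
    assert(n <= MAX_FRAMES);
    for (size_t i = 0; i < n; i++) {
        jobs[i].dst = dst + off;            /* output ranges never overlap */
        jobs[i].frame = &frames[i];
        off += frames[i].decoded_len;
        started[i] = pthread_create(&tids[i], NULL, worker, &jobs[i]) == 0;
        if (!started[i])
            worker(&jobs[i]);               /* fall back to decoding inline */
    }
    for (size_t i = 0; i < n; i++)
        if (started[i])
            pthread_join(tids[i], NULL);
    return off;
}
```

A real design would probably cap the thread count (for example at the number of cores) and choose frame sizes large enough that thread startup does not dominate; splitting into frames also costs a little compression ratio.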
Experimenting with compressing on
I think they technically go in ldata currently, but I'm not sure rodata helps the OS in any meaningful way beyond write protection.
Co-authored-by: Gabriel Baraldi <[email protected]>
Even without compression, this gives about an 8% improvement in load times.
7a0de7a to 0ee175b
```diff
@@ -160,6 +160,7 @@ JL_DLLEXPORT void jl_init_options(void)
     0, // task_metrics
     -1, // timeout_for_safepoint_straggler_s
     0, // gc_sweep_always_full
+    0, // compress_sysimage
```
We should revisit this default, but it's fine for now.
I'm going to merge this as an MVP that will let us see how bad the startup costs are in practice. IMO we should revisit multithreaded (de)compression and lz4hc before enabling it by default.
LGTM, default aside.
Should we do a PkgEval run on this one?
I don't think a PkgEval run is really relevant for this PR. |
Revived version of #48244, with a slightly different approach. This version looks for a function pointer called `jl_image_unpack` inside compiled system images and invokes it to get the `jl_image_buf_t` struct. Two implementations, `jl_image_unpack_zstd` and `jl_image_unpack_uncomp`, are provided (for comparison). The zstd compression is applied only to the heap image, not to the compiled code, since the compiled code can be shared across Julia processes.

TODO: test a few different compression settings and enable by default.
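The function-pointer dispatch described above can be modeled in a self-contained way. This is a simplified sketch, not Julia's code: `image_buf_t`, the `unpack_fn` signature, `fake_image_t`, `load_image`, and a toy run-length codec standing in for zstd are all hypothetical, and the symbol lookup of `jl_image_unpack` is replaced by a struct field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified analogue of jl_image_buf_t. */
typedef struct {
    const char *data;
    size_t size;
    int owned;                  /* 1 if data was malloc'd by the unpacker */
} image_buf_t;

/* Assumed signature for an unpack routine. */
typedef image_buf_t (*unpack_fn)(const unsigned char *blob, size_t len);

/* Analogue of jl_image_unpack_uncomp: use the embedded bytes in place. */
static image_buf_t unpack_uncomp(const unsigned char *blob, size_t len)
{
    image_buf_t buf = {(const char *)blob, len, 0};
    return buf;
}

/* Analogue of jl_image_unpack_zstd, with a toy RLE codec of
 * (count, byte) pairs standing in for zstd. */
static image_buf_t unpack_rle(const unsigned char *blob, size_t len)
{
    image_buf_t buf = {NULL, 0, 1};
    for (size_t i = 0; i + 1 < len; i += 2)
        buf.size += blob[i];                /* first pass: decoded size */
    char *out = malloc(buf.size ? buf.size : 1);
    if (out == NULL)
        abort();
    char *p = out;
    for (size_t i = 0; i + 1 < len; i += 2) {
        memset(p, blob[i + 1], blob[i]);
        p += blob[i];
    }
    buf.data = out;
    return buf;
}

/* Stand-in for a compiled image: the loader would find the function pointer
 * by symbol name (jl_image_unpack); here it is just a field. */
typedef struct {
    unpack_fn unpack;
    const unsigned char *blob;
    size_t len;
} fake_image_t;

static image_buf_t load_image(const fake_image_t *img)
{
    assert(img->unpack != NULL);
    return img->unpack(img->blob, img->len);
}

static void release_image(image_buf_t *buf)
{
    if (buf->owned)
        free((void *)buf->data);
    buf->data = NULL;
    buf->size = 0;
}
```

One benefit of this indirection is that the loader does not need to know how an image was stored: each image supplies its own unpack routine, so compressed and uncompressed images can be loaded through the same path.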
Example data from un-trimmed juliac "hello world":